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Abstract — Unions of graph multiplier operators are an impor- 
tant class of linear operators for processing signals defined on 
graphs. We present a novel method to efficiently distribute the 
application of these operators. The proposed method features ap- 
proximations of the graph multipliers by shifted Chebyshev poly- 
nomials, whose recurrence relations make them readily amenable 
to distributed computation. We demonstrate how the proposed 
method can be applied to distributed processing tasks such 
as smoothing, denoising, inverse filtering, and semi-supervised 
classification, and show that the communication requirements of 
the method scale gracefully with the size of the network. 

Index Terms — Chebyshev polynomial approximation, denois- 
ing, distributed optimization, learning, regularization, signal 
processing on graphs, spectral graph theory 

I. Introduction 

In distributed signal processing tasks, the data to be pro- 
cessed is physically separated and cannot be transmitted to a 
central processing entity. This separation may be due to engi- 
neering limitations such as the limited communication range 
of wireless sensor network nodes, privacy concerns, or engi- 
neering design considerations. Even when high-dimensional 
data can be processed centrally, for example, it may be more 
efficient to process it with parallel computing. It is therefore 
important to develop distributed data processing algorithms 
that balance the trade-offs between performance, communica- 
tion bandwidth, and computational complexity (speed). 

For concreteness, we focus throughout the paper on dis- 
tributed processing examples in wireless sensor networks; 
however, the problems we consider could arise in a number of 
different settings. Due to the limited communication range of 
wireless sensor nodes, each sensor node in a large network is 
likely to communicate with only a small number of other nodes 
in the network. To model the communication patterns, we can 
write down a graph with each vertex corresponding to a sensor 
node and each edge corresponding to a pair of nodes that 
communicate. Moreover, because the communication graph is 
a function of the distances between nodes, it often captures 
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spatial correlations between sensors' observations as well. 
That is, if two sensors are close enough to communicate, their 
observations are likely to be correlated. We can further specify 
these spatial correlations by adding weights to the edges of 
the graph, with higher weights associated to edges connecting 
sensors with closely correlated observations. For example, it 
is common to construct the graph with a thresholded Gaussian 
kernel weighting function based on the physical distance 
between nodes, where the weight of edge e connecting nodes 
i and j that are a distance d{i,j) apart is 

[ otherwise 

for some parameters a and k. 

We consider sensor networks whose nodes can only send 
messages to their local neighbors (i.e., they cannot communi- 
cate directly with a central entity). Much of the literature on 
distributed signal processing in such settings (see, e.g., |[T|-|I11 
and references therein) focuses on coming to an agreement 
on simple features of the observed signal (e.g., consensus 
averaging, parameter estimation). We are more interested in 
processing the full function in a distributed manner, with each 
node having its own objective. Some example tasks under this 
umbrella include: 

• Distributed denoising - In a sensor network of N sensors, 
a noisy A^-dimensional signal is observed, with each 
component of the signal corresponding to the observation 
at one sensor location. Using the prior knowledge that the 
denoised signal should be smooth or piecewise smooth 
with respect to the underlying weighted graph structure, 
the sensors' task is to denoise each of their components 
of the signal by iteratively passing messages to their local 
neighbors and performing computations. 

• Distributed semi-supervised learning / transductive clas- 
sification - A class label is associated with each sensor 
node; however, only a small number of nodes in the 
network have knowledge of their labels. The cooperative 
task is for each node to learn its label by iteratively 
passing messages to its local neighbors and performing 
computations. 

These and similar tasks have been considered in centralized 
settings in the relatively young field of signal processing on 
graphs. For example, 0-171 consider general regularization 
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frameworks on weighted gi'aphs; lISjl- lfTSll present graph-based 
semi-supervised learning methods; and fT6l-fT9l consider reg- 
ularization and filtering on weighted graphs for image and 
mesh processing. Spectral regularization methods for ill-posed 
inverse problems (see, e.g., EOl and references therein) are 
also closely related. 

Less work has been devoted to such tasks in distributed 
settings; reference II2TI considers denoising via wavelet pro- 
cessing and |22| presents a denoising algorithm that projects 
the measured signal onto a low-dimensional subspace spanned 
by smooth functions. References I123I - I26I consider different 
distributed regression problems. 

Our main contributions in this paper are i) to show that a key 
component of many distributed signal processing tasks is the 
application of linear operators that are unions of graph Fourier 
multipliers or generalized graph multiplier operators (to be 
defined in detail in Sections and [V] respectively); and ii) to 
present a novel method to efficiently distribute the application 
of the graph multiplier operators to high-dimensional signals. 

To elaborate a bit, graph Fourier multiplier operators are 
the graph analog of filter banks, one of the most commonly 
used tools in digital signal processing. Multiplying a signal 
on the graph by one of these matrices is analogous to 
reshaping the signal's frequencies by multiplying it by a 
filter in the Fourier domain in classical signal processing. 
The crux of our novel distributed computational method is 
to approximate each graph Fourier multiplier by a truncated 
Chebyshev polynomial expansion. In a centralized setting, [ 27]| 
shows that the truncated Chebyshev polynomial expansion 
efficiently approximates the application of a spectral graph 
wavelet transform, which is a specific example of a union 
of graph Fourier multipliers. In ||281 . we extend the Cheby- 
shev polynomial approximation method to the general class 
of unions of graph Fourier multiplier operators, and show 
how the recurrence properties of the Chebyshev polynomials 
also enable distributed application of these operators. The 
communication requirements for distributed computation using 
this method scale gracefully with the number of sensors in 
the network (and, accordingly, the size of the signals). In this 
paper, an extended version of |28|, we also generalize graph 
Fourier multiplier operators to generalized graph multiplier 
operators, provide theoretical bounds on the approximation 
error, illustrate the application of our framework in distributed 
smoothing, denoising, inverse filtering, and semi-supervised 
learning tasks, and compare our method to some alternative 
distributed computation methods. 

The remainder of the paper is as follows. In the next section, 
we provide some background from spectral graph theory. In 
Section [Hi] we introduce graph Fourier multiplier operators 
and show how they can be efficiently approximated with 
shifted Chebyshev polynomials in a centralized setting. We 
then discuss the distributed computation of quantities involving 
these operators in Section IV We generalize the framework 



in Section [Vj and provide application examples in Section 



VI In Section VII we compare our proposed method with 



some alternative distributed computation methods. Section 



II. Spectral Graph Theory 

Before proceeding, we introduce some basic notations and 
definitions from spectral graph theory |29l. Throughout, we 
use bold font to denote matrices and vectors, and we denote 
the n*'* component of a vector f by either f{n), (f)„, or /„. 
We model the communication network with an undirected, 
weighted graph Q — {V,£,w}, which consists of a set of 
vertices V, a set of edges £, and a weight function w : £ M+ 
that assigns a non-negative weight to each edge. We assume 
the number of nodes in the network, N — | V|, is finite, and the 
graph is connected. The adjacency (or weight) matrix W for 
a weighted graph Q is the N x N matrix with entries W,„ „, 
where 



W 



w{e), if e £ £ connects vertices m and n 
0, otherwise 



Therefore, the weighted graph Q can be equivalently repre- 
sented as the triplet {V,£,W}. The degree of each vertex 
is the sum of the weights of all the edges incident to it. We 
define the degree matrix D to have diagonal elements equal to 
the degrees, and zeros elsewhere. The non-normalized graph 
Laplacian is then defined as £ := D — W. 

A signal or function / : V — > M.^ defined on the vertices of 
the graph may be represented as a vector f G M^, where the 
component of the vector f represents the function value 



at the n vertex in V. For any f S 



C satisfies 



where m ^ n indicates vertices m and n are connected. 

As the graph Laplacian £ is a real symmetric matrix, it 
has a complete set of orthonormal eigenvectors. We denote 
these by {Xi]i=o 1 n-v These eigenvectors have associ- 
ated real, non-negative eigenvalues {A^ ^ ^ satisfying 
Cxi = ^iXt for t = 0, 1, . . . , — 1. Zero appears as an 
eigenvalue with multiplicity equal to the number of connected 
components of the graph |29|. Therefore, without loss of 
generality, we assume the eigenvalues of the Laplacian of the 
connected graph Q to be ordered as 

= Ao < Ai < A2... < Ajv-i := Amax- 

Just as the classical Fourier transform is the expansion of 
a function / in terms of the eigenfunctions of the Laplace 
operator 



/H = (e'-^/)- / f{x)e 



the graph Fourier transform f of any function f G 

on the vertices of G is the expansion of f in terms of the 

eigenfunctions of the graph Laplacian. It is defined by 



N 



fie) := (x.,f) = ExlW/W, 



(2) 



71=1 



VIII concludes the paper. 



where we adopt the convention that the inner product be 
conjugate-linear in the first argument. The inverse graph 
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Fourier transform is given by 

N-l 

fin) - E f(^)Xi{n). 



(3) 



III. Chebyshev Polynomial Approximation of 
Graph Fourier Multipliers 

In this section, we introduce graph Fourier multiplier opera- 
tors, unions of graph Fourier multiplier operators, and a com- 
putationally efficient approximation to unions of graph Fourier 
multiplier operators based on shifted Chebyshev polynomials. 
All methods discussed here are for a centralized setting, and 



we extend them to a distributed setting in Section IV 



*f = nN 



Fig. I. Application of a union of graph Fourier multiplier operators. 



A. Graph Fourier Multiplier Operators 

For a function / defined on the real line, a Fourier multi- 
plier operator or filter 4' reshapes the function's frequencies 
through multiplication in the Fourier domain: 

^'/(w) = g{uj)f{io), for every frequency lo. 

Equivalently, denoting the Fourier and inverse Fourier trans- 
forms by and J-^^, we have 



1 

2^ 



.gH^(/)Mj(x) 



(4) 



We can extend this straightforwardly to functions defined 
on the vertices of a graph by replacing the Fourier transform 
and its inverse in Q with the graph Fourier transform and 
its inverse, defined in (|2| and (|3]l. Namely, a graph Fourier 
multiplier operator is a linear operator ^ : 
can be written as 



that 



3(A^)/WxKn)- 



E 

^=0 



(5) 



Equivalently, we can write ^ = ^Y^=q 9i^i)XOC'i- We refer 
to (;(•) as the multiplier. A high-level intuition behind (|5]) 
is as follows. The eigenvectors corresponding to the lowest 
eigenvalues of the graph Laplacian are the "smoothest" in the 
sense that |x^(m) — xi{n)\ is small for neighboring vertices 
m and n. At the extreme is Xg, which is a constant vector 
(Xo("^) = Xo("-) for all m and n). The inverse graph Fourier 
transform (|3]l provides a representation of a signal f as a 
superposition of the orthonormal set of eigenvectors of the 
graph Laplacian. The effect of the graph Fourier multiplier 
operator * is to modify the contribution of each eigenvector. 
For example, applying a multiplier g{;) that is 1 for all 
below some threshold, and for all \i above the threshold is 
equivalent to projecting the signal onto the eigenvectors of the 
graph Laplacian associated with the lowest eigenvalues. This 
is analogous to low-pass filtering in the continuous domain. 



Section VI contains further intuition about and examples of 
graph Fourier multiplier operators. For more properties of the 
graph Laplacian eigenvectors, see ll30l and references therein. 



B. Unions of Graph Fourier Multiplier Operators 

In order for our distributed computation method of the next 
section to be applicable to a wider range of applications, 
we can generalize slightly from graph Fourier multipliers 
to unions of graph Fourier multiplier operators. A union 
of graph Fourier multiplier operators is a linear operator 
# : ^ E**^ (77 e {1,2,...}) whose application to a 
function f e can be written as (see also Figure [T]) 



#f = [vi/i;*2;...;*,]f 

= [(vl/if)i;...;(*if)jv;...;(*,f)i; 
- [(*f)i;(*f)2;...;(*f),jv], 



where for every j, *j : M M is a graph Fourier 

multiplier operator with multiplier and 

forje{l,2,...,77}, ne {1,2,...,7V}. 



(6) 



C. The Chebyshev Polynomial Approximation 

Exactly computing $f requires explicit computation of the 
entire set of eigenvectors and eigenvalues of £, which becomes 
computationally challenging as the size of the network, iV, in- 
creases, even in a centralized setting. As discussed in detail in 
II27I Section 6], a computationally efficient approximation $f 
of $f can be computed by approximating each multiplier .gj(-) 
by a truncated series of shifted Chebyshev polynomials. Doing 
so circumvents the need to compute the full set of eigenvectors 
and eigenvalues of £. We summarize this approach below. 

For y e [—1,1], the Chebyshev polynomials 
{7fc(y)}fe=o,i,2,... are generated by 

{1, if A: = 

y, if A; = 1 . 

2yTfe„i(y)-Tfe_2(y), if A: > 2 

These Chebyshev polynomials form an orthogonal basis for 



[-1,1], 



So every function h on [—1,1] that is 



square integrable with respect to the measure dyj \f\ 
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can be represented as h{y) = ^bo + Y^'kLi^kTkiv), where 
{hk]k=o.i,... is a sequence of Chebyshev coefficients that 
depends on h{-). For a detailed overview of Chebyshev 
polynomials, including the above definitions and properties, 
see ||3T1-Il33l. 

By shifting the domain of the Chebyshev polynomials to 

[0, Ainax] via the transformation x — ^^^(y + 1), we can 
represent each multiplier as 

1 ^ 



9o{^) = ^Cj,o + ^ Cj^kTk{x), for all x G [0, A„ 
fc=i 



(7) 



where 



A. 



X — a 
a 



and 



-i{k6) gj[a{cos{e) + 



d6. 



(8) 



For k > 2, the shifted Chebyshev polynomials satisfy 

_ 2 — — 

Tk{x) = -(x - Q;)Tfe_i(a;) - Tk-2{x). 
a 

Thus, for any f e M^, we have 

Tfc(/:)f =-{C- al) (Tfc„i(/:)f) - Tfc_2(/:)f, (9) 

a 

where Tfe(£) e E^><^ and the n*'' element of Tk{C)i is 
given by 

(T,(£)f)^^ ^ T,(A,)/Wx^(n). (10) 

1=0 

Now, to approximate the operator we can approximate 
each multiplier gj{-) by the first K + 1 terms in its Chebyshev 
polynomial expansion Q. Then, for every j e {1, 2, . . . , 77} 
and n e {1,2,..., A^}, we have 



(*f) 

■■= (^c,,of + ^c„feT,(£)f] 

\ k=l / , 



(11) 



Af-1 



|3)JToJ ^ 

-1 

E 



if 



N-l 



k=l 



Icjfi + '^Cj^kTkiXe 



k=l 



mXiin) 



1=0 L 

AT-l 

^ E5.(A.)/Wx£(n) 

To recap, we propose to compute by first computing the 
Chebyshev coefficients {cj^k}j=i,2,...,iy. k=i.2,....K according 
to ([8]), and then computing the sum in ( fTTj l. The computational 
benefit of the Chebyshev polynomial approximation arises 
in ( fTT) from the fact the vector Tk{C)i can be computed 
recursively from Tfe_i(£)f and Tfe_2('C)f according to (|9|. 



The computational cost of doing so is dominated by the 
matrix-vector multiplication of the graph Laplacian C, which 
is proportional to the number of edges, \£\ |27|. Therefore, 
if the underlying communication graph is sparse (i.e., \£\ 
scales linearly with the network size N), it is far more 
computationally efficient to compute $f than $f. Finally, 
we note that in practice, setting the Chebyshev approximation 
order K to around 20 results in # approximating $ very 
closely in all of the applications we have examined. 

IV. Distributed Chebyshev Polynomial 
Approximation 

In the previous section, we showed that the Chebyshev 
polynomial approximation to a union of graph Fourier mul- 
tipliers provides computational efficiency gains, even in a 
centralized computation setting. In this section, we discuss the 
second benefit of the Chebyshev polynomial approximation: it 
is easily distributable. 

A. Distributed Computation of $f 

We consider the following scenario. There is a network of N 
nodes, and each node n begins with the following knowledge: 

• f{n), the n*^ component of the signal f 

• The identity of its neighbors, and the weights of the graph 
edges connecting itself to each of its neighbors 

• The Chebyshev coefficients, Cj^, for j g {1,2, ...,77} 
and fc G {0, 1, 2, . . . , K}. These can either be computed 
centrally according to ([8]) and then transmitted throughout 
the network, or each node can begin with knowledge 
of the multipliers, {5j(-)}j=i. 2, and precompute the 
Chebyshev coefficients according to (|8]l 

• An upper bound on Amax. the largest eigenvalue of the 
graph Laplacian. This bound need not be tight, so we 
can precompute a bound such as Amax < max{(i(m) + 
d{n); m ^ n}, where d{n) is the degree of node n ||34J 

The task is for each network node n to compute 



*f } (12) 

by iteratively exchanging messages with its local neighbors in 
the network and performing some computations. 

As a result of ( [TT| ), for node n to compute the desired 
sequence in ^12) , it suffices to learn {(Tfe(£)f)^}^_^ ^ ^. 

Note that (Ti(/:)f)^ = {^{C-aI){)^ and"/:„,,„' =' 
for all nodes rn that are not neighbors of node n. Thus, to 
compute (Ti(£)f)^^, node n just needs to receive /(m) from 
all neighbors m. So once all nodes send their component of 
the signal to their neighbors, they are able to compute their 
respective components of Ti(£)f. In the next step, each node 
n sends the newly computed quantity (Ti(£)f) to all of 
its neighbors, enabling the distributed computation of T2(£)f 
according to (|9]). The iterative process of local communication 
and computation continues for K rounds until each node n has 
computed the required sequence { (Tfc(£)f)^^}^_^ ^ j^- l" 
all, 2K\£\ messages of length 1 are required for every node n 
to compute its sequence of coefficients in ( [T2| i in a distributed 
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Algorithm 1 Distributed Computation of $f 

Inputs at node n: /„, £„^„j Vm, {cfcj}^^! 2,....r,; k=Q,i,...,K' 
and A,nax 

Outputs at node n: < ( $/ 



Set (To - /„ 

Transmit /„ to all neighbors Afn ■= {ni : Cn,m < 0} 
Receive /„, from all neighbors Afn 
Compute and store 



for A: = 2, . . . , do 



Transmit (Tfe_i(£)f) to all neighbors Afn 
Receive (Tfc_i(£)f)^ from all neighbors Afn 
Compute and store 

2 

a 



-2(Tfc_i(/:)f 



- fc-2 



(^)f), 



end for 

for J e {1,2, 
Output 



,77} do 



($f) 
12: end for 



1 ^ - 



fashion. This distributed computation process is summarized 
in Algorithm 1. 

An important point to emphasize again is that although the 
operator $ and its approximation $ are defined through the 
eigenvectors of the graph Laplacian, the Chebyshev polyno- 
mial approximation helps the nodes apply the operator to the 
signal without explicitly computing (individually or collec- 
tively) the eigenvalues or eigenvectors of the Laplacian, other 
than the upper bound on its spectrum. Rather, they initially 
communicate their component of the signal to their neigh- 
bors, and then communicate simple weighted combinations 
of the messages received in the previous stage in subsequent 
iterations. In this way, information about each component 
of the signal f diffuses through the network without direct 
communication between non-neighboring nodes. 

B. Distributed Computation o/#*a 

The application of the adjoint of the Chebyshev poly- 
nomial approximate operator ^ can also be computed in a 
distributed manner. Let 



a = [ai;a2; . . . ; a^J e 



where a.j E M. . Then it is straightforward to show that 



Algorithm 2 Distributed Computation of $*a 
Inputs at node n: {aj(«)}j=i^2.. ..,»,' -^^ 

and {Cfejlj^i 2,...,r,; fc=0,l,-,^' 

Output at node n: (^^*^ 



for j = 1, 2, . . . , 77 do 

Set (To(/:)a,)„ = a,(n) 
end for 

Transmit {aj{n)} .^-^ ^ to all neighbors Afn 



5: Receive {ajija)} ^ ^ from all neighbors Afn 
6: for j = 1, 2, . . . , do 
7: Compute and store 



(Ti(£)aj)^^ = ^ -Cn,niaj{m) -2aj{n) 
end for 

for fc = 2, do 

Transmit { (Tfc_i(£)aj) ^ }^._^ ^ ^ to all neighbors 



Afn 



11: Receive I (T/j_i(£)aj) } ._ from all neighbors 

Mn 

12: for j — 1,2, . . . ,jj do 
13: Compute and store 

2 

i 

a 



-2(T,_i(/:)a,)„ 



-fe-2 



end for 
end for 

Output 



*^)„ = E Uc..o«.(") + Ec,„. (T,(/:)a,) 1 . 
j=i [ fc=i J 



We assume each node n starts with knowledge of aj{n) for all 
j £ {1,2,..., 77}. For each j G {1,2,..., rj}, the distributed 
computation of the corresponding term on the right-hand side 
of (\3\ is done in an analogous manner to the distributed 
computation of discussed above. Since this has to be done 
for each j, 2K\£\ messages, each a vector of length 77, are 
required for every node n to compute (^*a^) • The distributed 
computation of €>*a is summarized in Algorithm 2. 

C. Distributed Computation o/$*#f 

Using the property of the Chebyshev polynomials that 

Tk{x)Tk'{x) = ^ [Tk+k'{x) + T\k^k'\{x)\ , 



we can write 



E (^c,,oa,+^c,-feT,(/:)a,) . (13) f^*^A ^(^dot + Tdk 

3=1 \ fc=i / „ ^ ^ \^ t^i 



Tfc(/:)f 
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(see II27I Section 6.1] for a similar calculation and an explicit 
formula for the coefficients {dfclfe^o 1 2/f)- Therefore, with 
each node n starting with /(n) as in Section IV- A the 



nodes can compute $*#f in a distributed manner using 
4K\£\ messages of length 1, with each node n finishing with 
knowledge of ($*^f 1 



th 



(c) * and P commute; i.e., *P = P*. 

Proof of Proposition [T]- (a) implies (b) if we set the i 
column of U to Xi-i^ ^nd (b) implies (a) if we set Xi to the 
(e + 1)''* column of U and g{Xi) to the {£ + ly* diagonal 
element of U*'S'U. The equivalence between (b) and (c) is 
shown in 135j Corollary 4.5.18]. ■ 



D. Example: Distributed Smoothing 

Perhaps the simplest example application of the distributed 
Chebyshev approximation method is distributed smoothing 
with the heat kernel as the graph Fourier multiplier. One way 
to smooth a signal y £ is to compute Hty, where, for 
a fixed t, (Hty)(n) Efc"o' e"*^^y(^)Xf(?^)- clearly 
satisfies our definition of a graph Fourier multiplier operator 
In the context of a centralized image smoothing application, 
ifTSJ discusses in detail the heat kernel, Ht, and its relationship 
to classical G aussian filtering. Similar to the example at the 
the main idea is that the multiplier e^*'^* 
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end of Section 

acts as a low-pass filter that attenuates the higher frequency 
(less smooth) components of y. 

Now, to perform distributed smoothing, we just need to 
compute Hty in a distributed manner according to Algorithm 
1, where Ht is the shifted Chebyshev polynomial approxima- 
tion to the graph Fourier multiplier operator Hj. Each node 
starts with an observation yn and finishes with (Hty)„. We 
discuss more complicated application examples in Section [Vl] 



B. Bound on the Approximation Error 

Another interesting question is how closely a graph Fourier 
multiplier (or union of such operators) is approximated by its 
Chebyshev polynomial approximation. The following result, 
which bounds the spectral norm of their difference, is used in 
Section IV^B] 

Proposition 2: Let # be a union of 77 generalized graph 
multiplier operators; i.e., it has the form given in (|6]), where 
{Xi]i=o 1 N-i the eigenvectors of a real positive semi- 
definite matrix P. Let # be the order K Chebyshev polyno- 
mial approximation of Define 



B{K) 



max 



sup 

Ae[0,A„„ 



{|.g,(A)-pf(A)|} 



where Amax is the largest eigenvalue of P, and pf{-) is the 
order K Chebyshev polynomial approximation of gj{ ). Then 



V. Generalized Graph Multiplier Operators 

While defining the graph Fourier transform in terms of 
the eigenvectors of the graph Laplacian preserves a close 
analogy with the classical Fourier transform and leads to 
natural notions of smoothness for signals defined on graphs, 
the distributed computational methods of the previous section 
do not rely on specific properties of the graph Laplacian 
other than its real symmetric positive semi-definite nature. In 
this section, we generalize the definition of a graph Fourier 
multiplier operator by replacing C with any real symmetric 
positive semi-definite matrix. 

A. Definition and Equivalent Characterization of a Graph 
Multiplier Operator 

Definition 1: 'S' is a graph multiplier operator with respect 
to the real symmetric positive semi-definite matrix P if there 
exists a function g : [0, Amax(P)] ~> and a complete set 
{Xi}e=o 1 N-i °f orthonormal eigenvectors of P such that 



II*- *ll 



N-l 

E 



gi>^e)XiXh 



(14) 



where {Afj^^g ^ ^^-i ^re the eigenvalues of P. 

We now provide equivalent characterizations of the class of 
graph multiplier operators with respect to a fixed matrix. 

Proposition 1: The following are equivalent: 

(a) 'S' is a graph multiplier operator with respect to P. 

(b) 'S' and P are simultaneously diagonalizable by a unitary 
matrix; i.e., there exists a unitary matrix U such that 
U**U and U*PU are both diagonal matrices. 



IK* - *)f||9 

max „,^^ < B(K' 

f^O l|f||2 - ^ ' 



TjN. (15) 



The proof of Proposition |2] is included in the Appendix. 

Finally, note that when the multipliers gj{-) are smooth, the 
Chebyshev approximations pf {■) converge to the multipliers 
rapidly as K increases. In particular, the following proposition 
applies. 

Proposition 3 (Theorem 5.14, XJTil ): If gj{-) has M + 1 
continuous derivatives for all j, then B{K) ~ O {K~^^Y 
The Chebyshev polynomial approximation and distributed 



computation methods presented in Section III and IV also 
apply to these generalized graph multiplier operators. This 
generalization significantly extends the applicability of our 
distributed computation method, and is especially useful in 
distributed applications where we can first choose the eigenba- 
sis to use in the transform, and then design the communication 
graph accordingly. 



VI. Illustrative Applications 

In this section, we provide more detailed explanations of 
how our novel distributed approximation framework can be 
used in the context of a diverse set of distributed signal 
processing tasks in sensor networks. To be clear, our contribu- 
tion here is not to suggest new signal processing techniques, 
but simply to show how a subset of the existing centralized 
techniques can be efficiently implemented in a distributed 
manner. 
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A. Denoising with Distributed Tikhonov Regularization 

In this section, we consider the distributed denoising task 
discussed in Section |l] To recall, we start with a noisy signal 
y G that is defined on a graph of N sensors and has been 
corrupted by uncorrected additive Gaussian noise. Through 
an iterative process of local communication and computation, 
we would like each sensor to end up with a denoised estimate 
of its component, /°, of the true underlying signal, f*^. 

To solve this problem, we enforce a priori information that 
the target signal is smooth with respect to the underlying graph 
topology. To enforce the global smoothness prior, we consider 
the class of regularization terms P£''f for r > 1. The resulting 
distributed regularization problem has the form 

argmin^llf-yll^ + f^rT. (16) 
f ^ 

To see intuitively why incorporating such a regularization term 
into the objective function encourages smooth signals (with 
r = 1 as an example), note that F£f = if and only if f is 
constant across all vertices, and, more generally, 

1 
2 



n^V rri~n 



,(/(m)-/(n))^ 



so P£f is small when the signal f has similar values at 
neighboring vertices with large weights (i.e., it is smooth). 

We now show how our novel method is useful in solving 
this distributed regularization problem. 

Proposition 4: The solution to ( [T6] l is given by Ry, where 
R is a graph Fourier multiplier operator of the form Q, with 
multipHer g{Xi) = :^r^ 

The proof of PropositionW] is included in the Appendix. 

So, one way to do distributed denoising is to compute 
Ry, the Chebyshev polynomial approximation of Ry, in a 
distributed manner via Algorithm 1 . We show this now with a 
numerical example. We place 500 sensors randomly in the 
[0, 1] X [0, 1] square. We then construct a weighted graph 
according to the thresholded Gaussian kernel weighting ([T]) 
with a — 0.074 and k — 0.600, so that two sensor nodes 
are connected if their physical separation is less than 0.075. 
We create a smooth 500-dimensional signal with the 
component given by /,j = + — 1, where 
node n's x and y coordinates in [0, 1] x [0, 1]. One instance of 
such a network and signal f ° are shown in Figure |2] and the 
eigenvectors of the graph Laplacian are shown in Figure [3] 

Next, we corrupt each component of the signal f*^ with 
uncorrected additive Gaussian noise with mean zero and stan- 
dard deviation 0.5. Then we apply the graph Fourier multiplier 
operator R, the Chebyshev polynomial approximation to R 
from Proposition |4] with t — r = 1 and K = 15. The 
multiplier and its Chebyshev polynomial approximations are 
shown in Figure [4j and the denoised signal Ry is shown in 
Figure |5] We repeated this entire experiment 1000 times, with 
a new random graph and random noise each time, and the 
average mean square error for the denoised signals was 0.013, 
as compared with 0.250 average mean square error for the 
noisy signals. 

'This filter g{Xe) is tlie graph analog of a first-order Bessel filter from 
classical signal processing of functions on the real line. 



and Uy are 




Fig. 2. A network of 500 sensors placed randomly in the [0, 1] X [0, 1] 
square. The background colors represent the values of the smooth signal f". 



Xq 





(d) 



Fig. 3. Some eigenvectors of the Laplacian of the graph shown in Figure 
[2] The blue bars represent positive values and the black bars negative values, 
(a) Xo' 'he constant eigenvector associated with Aq = 0. (b) Xi. the Fiedler 
vector associated with the lowest strictly positive eigenvalue, nicely separates 
the graph into two components, (c) X2 ^1^° ^ smooth eigenvector, (d) X50 
is far less smooth with some large differences across neighboring nodes. 



B. Denoising with Distributed lasso 

We now consider an alternative method of distributed de- 
noising that is better suited to situations where we start with 
a prior belief that the signal is not globally smooth, but 
rather piecewise smooth, which corresponds to the signal being 
sparse in the spectral graph wavelet domain fTl]. 

The spectral graph wavelet transform, H, defined in |27|, 
is precisely of the form of 4> in (|6]). Namely, it is composed 
of one multiplier, h{-), that acts as a low-pass filter to stably 
represent the signal's low frequency content, and J wavelet 
operators, defined by gj{Xi) — g{tjXi), where {ij}j=i,2,....j 
is a set of scales and g(-) is the wavelet multiplier that acts 
as a band-pass filter 

The most common way to incorporate a sparse prior in a 
centralized setting is to regularize via a weighted version of the 
least absolute shrinkage and selection operator (lasso) l,36J . 
also called basis pursuit denoising M37I : 



argmm -||y 

a ^ 



(17) 
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Algorithm 3 Distributed lasso 



Fig. 4. The regularizing multiplier ^^^^^ associated with the graph 
Fourier multiplier operator R from Proposition |4] Here, r = t = 1. 
Shifted Chebyshev polynomial approximations to the multiplier are shown 
for different values of the approximation order K. 



Original Signal 



Noise 






(b) 

Denoised Signal 




(d) 



(c) 

Fig. 5. A denoising example on the 
regularizing multiplier shown in Figure 

where Ux and Uy are the x and y coordinates of sensor node n. (b) The 
additive Gaussian noise, (c) The noisy signal y. (d) The denoised signal Ry. 



jgraph shown in Figure [2| using the 
4 (a) The original signal rv^ + riy — 1, 



where |la|li_^ := l^i 1^*1 > for all i. The 

optimization problem in ^Tl) can be solved for example by 
iterative soft thresholding |[38l . The initial estimate of the 
wavelet coefficients a'^^ is arbitrary, and at each iteration of 
the soft thresholding algorithm, the update of the estimated 
wavelet coefficients is given by 



(/3-1) 



+ 7^ 



z = l,2,...,7V(J+l); /3-l,2, 



(18) 



where 7 is the step size and 
thresholding operator 



is the shrinkage or soft 



sgn(z)^j7 



if \z\< /i,7 
o.w. 



The iterative soft thresholding algorithm converges to a*, the 
mmimizer of ([17]), if 7 < [39|. The final denoised 

estimate of the signal is then given by H*a*. 

We now turn to the issue of how to implement the above 
algorithm in a distributed fashion by sending messages be- 
tween neighbors in the network. One option would be to 



Inputs at node n: ?/„, Vm, {ck,j}j^i^2,...,j+i; k=o,i,...,K' 

Amax, 7, and {fJ'U-l)N+n} j^i 2....,J+l 

Outputs at node n: t/„*, the denoised estimate of /° 
1: Arbitrarily initialize <^ (a*^*^^),. > 

^ U '^0-l)^+"ij = l,2,...,J+l 

2: Set /? = 1 

3: Compute and store < (Hy) > 

tV /(i-l)Af+nJ j-^i,2,...,J+l 

via Algorithm 1 
4: while stopping criterion not satisfied do 

5: Compute and store < ( HH*a''^~^) ] 

Aj-1)A^+"J j = l,2,...,J+l 

via Algorithm 2, followed by Algorithm 1 
6: for j = 1,2, . . . , J + 1 do 
7: Compute and store 



(a(«) 

V / (j-l)N+n 



= 5, 



\ 



-7 (HH*a(^-i)' 



U-l)N+7i 



end for 

Set 13^ 13 + 1 
end while 

for j = 1,2,..., J+1 do 

Set (a*)(j_;^)^_|_„ 
end for 

Compute and store y„» = ( W*a» ) via Algorithm 2 
Output Un* 



= fa(«^ 



{j-l)N+ii 



use the distributed lasso algorithm of 11251 . 11261 . which is a 
special case of the alternating direction method of multipliers 
t40i p. 253]. In every iteration of that algorithm, each node 
transmits its current estimate of all the wavelet coefficients to 
its local neighbors. With the spectral graph wavelet transform, 
that method requires 2\£\ total messages at every iteration, 
with each message being a vector of length N{J + 1). A 
method where the amount of communicated information does 
not grow with N (beyond the number of edges, \£\) would be 
highly preferable. 

The Chebyshev polynomial approximation of the spectral 
graph wavelet transform allows us to accomplish this goal. 
Our approach, which is summarized in Algorithm 3, is to ap- 
proximate H by H, and use the distributed implementation of 
the approximate wavelet transform and its adjoint to perform 
iterative soft thresholding in order to solve 



argmm 



1, 



y-3*a||2 



(19) 



In the first soft thresholding iteration, each node n must 
learn (Hy)(j_i)jv-|_„ at all scales j, via Algorithm 1. These 
coefficients are then stored for future iterations. In the 
iteration, each node n must learn the J + 1 coefficients 
of HH*a(^~^' centered at n, by sequentially applying the 
operators H* and H in a distributed manner via Algorithms 
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2 and 1, respectively. When a stopping criterion for the soft 
thresholding is satisfied, the adjoint operator H* is applied 
again in a distributed manner to the resulting coefficients a*, 
and node n's denoised estimate of its signal is ^H*a*^ . The 
stopping criterion may simply be a fixed number of iterations. 



or it may be when 



< e for all 



n and some small e. Finally, note that we could also optimize 
the weights fi by performing distributed cross-validation, as 
discussed in f25l, f26\. 

We now examine the communication requirements of this 



approach. Recall from Section IV that 2K\£\ messages of 
length 1 are required to compute Hy in a distributed fashion. 
Distributed computation of HH*a^^^^'', the other term needed 
in the iterative thresholding update ([T8|, requires 2/'ir|£| mes- 
sages of length J+1 and 2A'|£| messages of length 1. The final 
application of the adjoint operator H* to recover the denoised 
signal estimates requires another 2K\£\ messages, each a 
vector of length J + 1. Therefore, the Chebyshev polynomial 
approximation to the spectral graph wavelet transform enables 
us to iteratively solve the weighted lasso in a distributed man- 
ner where the communication workload only scales with the 
size of the network through \£\, and is otherwise independent 
of the network dimension N. 

The reconstructed signal in Algorithm 3 is H*a*, where a* 
is the solution to the lasso problem ([T9|. A natural question is 
how good of an approximation H*a* is to H*a*, where a* is 
the solution to the original lasso problem ( [T7] i. The following 
proposition bounds the squared distance between these two 
quantities by a term proportional to the spectral norm of the 
difference between the exact and approximate spectral graph 
wavelet operators. 

Proposition 5: ||H*a* -H*a* < C|||H-H|||2, where |||-|||2 
is the spectral norm, and the constant C — J^^^. 

Combining Proposition |5] whose proof is included in the 
Appendix, with ( [T5| l, we have 



< 



|y|li 



B{K)^N{J+l) 



(20) 



Thus, as we increase the approximation order K, B{K) and 
the right-hand side of ( |20] i tend toward zero (at a speed 
dependent on the smoothness of the graph wavelet multipliers 
g{-) and /»(•)). 

Finally, to illustrate the distributed lasso, we again consider 
a numerical example where we place 500 sensors randomly in 
the [0, 1] X [0, 1] square, with the same graph construction as 



the numerical example in Section VI-A This time, however, 
the underlying signal is only piecewise-smooth, with the n*'' 
component given by 



f = 

J n 



-2nr, 



0.5, 



0.5, 



if rij, > 1 
if rij, < 1 



We again corrupt each component of the signal f with uncor- 
rected additive Gaussian noise with mean zero and standard 
deviation 0.5. We then solve problem ([T9| in a distributed 
manner using Algorithm 3. We use a spectral graph wavelet 
transform with 6 wavelet scales, with the kernels automatically 
designed by the spectral graph wavelets toolbox 1411. In 



Algorithm 3, we run 300 soft thresholding iterations and take 
7 — 0.2, fii = 0.75 for all the wavelet coefficients, and 
fii = 0.01 for all the scaling coefficients |^ We do not perform 
any distributed cross-validation to optimize the weights fi. 
We repeated this entire experiment 1000 times, with a new 
random graph and random noise each timej^ The average 
mean square errors were 0.250 for the noisy signals, 0.098 
for the estimates produced by the Tikhonov regularization 
method ([T6|, 0.088 for the denoised estimates produced by the 
distributed lasso with the exact wavelet operator, and 0.079 for 
the denoised estimates produced by the distributed lasso with 
the approximate wavelet operator with K — 15. Note that the 
approximate solution does not necessarily result in a higher 
mean square error than the exact solution. 

C. Distributed Inverse Filtering 

We now consider the situation where node n observes the 
n*'* component of y = yff + i/, where 'S' is a graph Fourier 
multiplier operator with multiplier <?*(•), and i/ is uncorrected 
Gaussian noise. The task of the network is to recover f by 
inverting the effect of the graph multiplier operator Nf. This 
is the distributed graph analog to the deblurring problem in 
imaging, which is discussed in ll42l Chapter 7]. As discussed 
in 1 42, Chapter 7], trying to recover f by simply applying the 
inverse filter in the graph Fourier domain, i.e., setting 



N-l 



1 



= E ( m + ) Xdn) 



N-l 



- fin) + E 



m 



(21) 



does not work well when .g*(-) is zero (or close to zero) for 
high frequencies, because the summation in ( |2T| ) blows up, 
dominating f{n). Therefore, we again use the prior that the 
signal is smooth with respect to the underlying graph structure, 
and approximately solve the regularization problem 



argmm -||y 
f ^ 



*f|l2 



(22) 



in a distributed manner. 

Proposition 6: The solution to (|22]i is given by Ry, where 
R is a graph Fourier multiplier operator with multiplier 



Tgl(Xe) + 2Xl- 



The proof of Proposition |6] is included in the Appendix. 
From Proposition |6] we conclude that one method to perform 
distributed inverse filtering is to compute Ry, the Chebyshev 
polynomial approximation to Ry, in a distributed manner 
according to Algorithm 1. 

-The scaling coefficients in the spectral graph wavelet transform are not 
expected to be sparse. 

^The reported errors are averaged over the 441 random graph realizations 
that were connected. 
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D. Distributed Semi-Supervised Classification 

The goal of semi- supervised classification is to learn a 
mapping from the data points X = {xi, a;2, . . . , xjv} to 
their corresponding labels Y = {yi, ?/2, ■ • • , yw}- The pairs 
{xi, Hi) are independently and identically sampled from a joint 
distribution p{x, y) over the sample space X x y, where 
3^ := {1, 2, . . . , k} is the space of k classes. The transductive 
classification problem is to use the full set of data points 
X = {xi,X2, ■ ■ ■,xn} and the labels Yi = {yi,y2, ■■■,yi} 
associated with a small portion of the data (I << N) to 
predict the labels Ya — {yi+i,yi+2, ■ ■ ■ , yw} associated with 
the unlabeled data Xu — {xj+i, a;/_|_2, • ■ • ,xn}- 

Many semi-supervised learning methods represent the data 
X by an undirected, weighted graph, and then force the 
labels to be smooth with respect to the intrinsic structure 
of this graph. We show how a number of these centralized 
graph-based semi-supervised classification methods can be 
distributed using Chebyshev polynomial approximation of 
generalized graph multiplier operators. Throughout the sec- 
tion, we assume there is one data point at each node in the 
graph, and the nodes know the weights of the edges connecting 
them to their neighbors in the graph. For example, each data 
point could be at a different node in a sensor network, and the 
weights could be a function of the physical distance between 
the nodes, as in ([TJ. 

1) Centralized Graph-Based Methods: For different 
choices of reproducing kernel Hilbert spaces (RKHS) H, a 
number of centralized semi-supervised classification methods 
estimate the label of the n*'' data point (n E {I + 1, . . . , N}) 
by 



arg max F°^* 
je{i,2,...,K} 



(23) 



where F°p* is the solution to 



F-^"* = argmin ^ {r||F:,, - Y..jl + \\F..jl,} . (24) 



In p4l ). A: j denotes the j column of a matrix A; Y is a 
N X K matrix with entries 



Y. 



1, if i € {1, 2, . . . ,1} and the label for point i is j 
0, otherwise 



and for some symmetric positive semi-definite matrix P e 

mNxN 



i\\l = {i,i)n ■■= (f,Pf)=FPf. 



(25) 



Note that for any symmetric positive semi-definite matrix 
P, H endowed with the inner product defined in p5| ) is in 
fact a RKHS on PM^, and its kernel is k{ij) = (p-^)^^., 
where P^^ denotes the pseudoinverse if P is not invertible 
E Theorem 4]. 

We now give some examples of graph-based centralized 
semi-supervised classification methods that fall into this cate- 
gory. 

• In Tikhonov regularization, P — (e.g., fTO)) 
. Zhou et al. IDl take P = i^n^rm^ where Cnorm := 
D~2 £D~3, and also consider a variant with P = CD^ 



Smola and 
methods 



exp 1 2£ 
cos 

(ctIat - C 



Kondor 15] consider a variety of 
a diffusion process with 

with P 



including 



kernel 

P = 



an inverse cosine 



0]; 

rifl 



, and an r-step random walk with P = 
where a >2 and Ijy is the x 



identity matri 

Zhu et al. [13, Chapter 15] take the kernel approach a 
step further by solving a convex optimization problem to 
find a good P 

Ando and Zhang's K-scaling method lfT4ll . ifTsll takes 



(7ljv + D)- 



when 7 = 



D) 



which reduces to Cnorm 

Before moving on to the distributed semi-supervised clas- 
sification problem, we note two important properties of the 
matrices P used in each of these above methods: 1) They 
have the same sparsity pattern as £, and 2) they are easily 
computable from C. 

2) Distributing the Centralized Graph-Based Methods: 
Now, F°P'^ in ( p4| i can be equivalently rewritten as the solution 
to K separate minimization problems, with 



F°^* = argmin {r||f-Y,,||2 
feR" 



rpf} 



(26) 



If P = X)fco^ 9{^e)XiX4 for some g{-), then to solve ( |26] l, 
we can simply compute RY: j, where R is a graph Fourier 
multiplier operator with multiplier ^qr^^p^. Otherwise, p6| ) 
is essentially of the form of ([16]), with a different choice of 
multiplier basis. That is, we can write the solution to (|26| as 



2) 
3) 



RY. J, where R is a generalized graph multiplier operator of 
the form ( [T4] l, with respect to P. The multiplier is ^^Xt ■ 

Therefore, the following is a method to distribute any of the 
centralized semi-supervised classification methods that can be 
written as ( |23] l and ( |24] i: 

1) Node n starts with or computes the entries of n*'* row 
of P (which are easily computable from C in the cases 
mentioned above) 

Each node n forms the n*^ row of Y 
For every j £ {1, 2, . . . , k}, the nodes compute F°^* :— 
RY. J in a distributed manner via Algorithm 1 (with P 
replacing C) 

4) Each node n with an unlabeled data point computes its 
label estimate according to argmax^gj]^ 2 «} 
By Propositions |2] and [3] as we increase the Chebyshev approx- 
imation order K, the classification results of this distributed 
method converge to results of the corresponding centralized 
semi-supervised learning method. 

VII. Comparison with Other Distributed 
Processing Methods 

In this section, we compare our proposed method with two 
variations of Jacobi's iterative method. These variations are 
not distributed computation methods per se, but rather central- 
ized computation methods that can be easily parallelized and 

^All three of these P matrices can be written as P = 9(^e)Xi'X^ 
for some where {xi}i—q 1 jv-i ^'^^ ''^^ eigenvectors of Cnorm- 
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that we deem most appropriate for the types of applications 
mentioned above. We have not included the Gauss-Seidel 
or conjugate gradient methods, for example, because of the 
extra synchronicity considerations they would require in a 
distributed implementation. 

A. Jacobi's Iterative Method 

For P = Cnormi Zhou et al. ifTTl propose to solve 
through the iteration 

p(t+i) 



1 



1 



I AT 



P) F(*) + rY 



(27) 



where F'^^ is arbitrary (they set it to Y)j^The iteration ( p7| ) 
is in fact just a particular instance of Jacobi's iterative method 
(see, e.g., Ii43. Chapter 4]) to solve the set of linear equations 

- p) F°?'* = tY. 



(rl 



N 



(28) 



We can generalize the application of the Jacobi method to 
solve p6l ) for the other matrices P mentioned above, as 
follows. We write P = Pj^ — Pq, where P^ is a diagonal 
matrix with diagonal entries equal to the diagonal entries of 
P, and Po has zeros on the diagonal, and off-diagonal entries 
equal to the negative of the off-diagonal entries of P. Then 
the Jacobi iterative method to solve ( |28] l is given by 

(29) 



F(*+i) = (rljv + VdV^ [PoFW + ty' 



I 



N ■ 



Note that for P = Cnorm, Pd — ^n and Pf 
so (|29| reduces to ( |27] i. 

So one alternative distributed semi-supervised classification 
method is to compute the iterations (|29| in a distributed 
manner, with each node starting with knowledge of its row 
of P and Y. In fact, the communication cost of one iteration 
of p9l ) is the same as the communication cost of one iteration 
of the distributed computation of RY (lines 6 and 7 of 
Algorithm 1). We present a numerical example comparing the 



convergence rates of these methods in Section VII-C 



For graph multiplier operators whose multipliers have the 
property g(Xi) ^ for all i, we can generalize the Jacobi 
method as follows. Suppose we wish to compute Ry, where 
R is a graph multiplier operator with respect to P and with 
multiplier g{-). This is equivalent to solving the linear system 
of equations 



Xixl f = y- 



Define the matrix Q Efco^ -^(j^XiX* and let Q = Qd - 
Qo, where Qjj is the diagonal matrix with diagonal entries 
equal to those of Q. Then the Jacobi iteration is 




(30) 



However, one immediate drawback of Jacobi's method, as 
compared with our proposed method, is that it does not always 
converge. The iterations in ([30| converge for any x'*'^ if and 
only if the spectral radius of Q^^Qo is less than one 

^In 1131 Chapter 11], similar iterative label propagation methods from 
and 1121 are also compared with the method of llll . 



Theorem 4.1]. One sufficient condition for the latter to be 
true is that Q is strictly diagonally dominant, as is the case 
for example when P = £ and g{Xg) 



T+\e 



B. Jacobi's Iterative Method with Chebyshev Acceleration 

When Jacobi's method does converge, we can accelerate 
( |30l ) using the following algorithm Ii44. Algorithm 6.7]. Let 
p be an upper bound on the spectral radius of Qj^^Qo- and 
define ^(o) 1, ^(i) := p, and x^^) := Q^^Qox^") + Q^V- 
Then for i > 1, let 



(t+i) - 



1 



,(t+i) 



2 _ 

2^(t+i) 



, and 



D y- 



(31) 



To distribute ( |3T| l, each node n must first learn Q„„ and 
the n*'* row of Qo- For example, when P Cnorm and 



T + \. 



, as in 



(127]), Qr, 



for all n, and the n' 



row of Qo is just — - times the n*'' row of Cnorm, which 
the nodes can easily compute in a distributed manner An 
additional challenge in a distributed setting may be to calculate 
the bound p. 

Finally, note that while this method and our method share 
the same namesake, the use of the Chebyshev polynomials 
in the two is different. We use Chebyshev polynomials to 
approximate the multiplier, whereas this method improves the 
convergence speed of the Jacobi method by using Chebyshev 
polynomials to choose the weights it uses to form the iterates 
in ( (3T| as weighted linear combinations of the iterates in ([30|. 
See Section 6.5.6 of 144 J for more details. 

C. Numerical Comparison 

An in-depth comparison of the convergence conditions and 
rates of these methods with our proposed method is beyond 
the scope of this paper, but we compare their convergence 
rates here on a few simple numerical examples. We con- 
sider the same random graph shown in Figure |2j and we 
generate a signal f on the vertices of the graph with the 
components of f independently and identically sampled from 
a uniform distribution on [—10, 10]. For different choices of 
positive semi-definite matrices P G jjsooxsoo^ define 
y (I500 + :f P) with r = 0.5. Then, starting with 
y, we iteratively compute an approximation to f in three 
different distributable ways: 1) Ry, where R is the Chebyshev 
approximation to R, a generalized graph multiplier operator 
with respect to P, and with multiplier ^q^; 2) with the Jacobi 
iteration ( p9| l; and 3) with the Jacobi iteration with Chebyshev 
acceleration ( (3T] i. Since the communication requirements of 
our method with Chebyshev approximation order K are equal 
to the communication requirements of K iterations of the latter 
two methods, we plot the errors |jf'^^) — f||2 (where f^^^ 
corresponds to Ry with an order K approximation in the 
first case or the result of the if*'' iteration in the latter two 
cases) on the same axes in Figure |6] We repeat the experiment 
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- Our distributed Chebyshev approximation method 

- ■ Jacobi method 

- Jacobi method with Ghebyshe 



Approximation Order 



20 30 40 50 60 

Iterations / Approximation Order 



P — (2Ijv ^norn 



— Our distributed Chebyshev abprsximatibh tr 
- ' Jacobi trelhcd 

- Jacobi tretbod with Chebyshev acceietatior 



id Chebyshev approxitaalion iv 
- Jacobi teethed with Chebyshev acceietatior 



Iterations / Approximation OrriGr 



10 20 30 40 50 

Iterations / Approximation Order 



Fig. 6. Thi'ee different distributed metliods to approximately compute Ry, 
wliere R, is a grapli multiplier operator with respect to P for different choices 
of P. In all cases, the multiplier is g{Xi) = . The error shown is 

||f(if) _f||2, where f(^^) is either Ry with an order K approximation for 
our distributed Chebyshev approximation method, or the result of the K^^ 
iteration for the two Jacobi methods. 



with P 



£D ^, and the three-step random walk 

3 



r 

process (2I500 — Cnorm)^'^ , for which the Jacobi method does 
not converge. In these experiments, not only does our proposed 
method always converge, but it converges faster than the 
alternative methods. 



VIII. Concluding Remarks 

We presented a novel method to distribute a class of linear 
operators called unions of graph multiplier operators. The main 
idea is to approximate the graph multipliers by Chebyshev 
polynomials, whose recurrence relations make them readily 
amenable to distributed computation. Key takeaways from the 
discussion and application examples include: 

• A number of distributed signal processing tasks can be 
represented as distributed applications of unions of graph 
multiplier operators (and their adjoints) to signals on 
weighted graphs. Examples include distributed smooth- 
ing, denoising, inverse filtering, and semi-supervised 
learning 

• Graph Fourier multiplier operators are the graph analog 
of filter banks, as they reshape functions' frequencies 
through multiplication in the Fourier domain 

• The amount of communication required to perform the 
distributed computations only scales with the size of the 
network through the number of edges of the communica- 
tion graph, which is usually sparse. Therefore, the method 
is well suited to large-scale networks 

• The approximate graph multiplier operators closely ap- 
proximate the exact operators in practice, and for graph 
multiplier operators with smooth multipliers, an upper 
bound on the spectral norm of the difference of the 



approximate and exact operators decreases rapidly as we 
increase the Chebyshev approximation order 
In addition to considering more applications, our ongoing 
work includes analyzing robustness issues that arise in real 
networks. For instance, we would like to incorporate quan- 
tization and communication noise into the sensor network 
model, in order to see how these propagate when using the 
Chebyshev polynomial approximation approach to distributed 
signal processing tasks. It is also important to analyze the 
effects of a sensor node dropping out of the network or 
communicating nodes losing synchronicity to ensure that the 
proposed method is stable to these disturbances. 



IX. Appendix 

Proof of Proposition [2j For j — 
1,2,. ..,7V, 



1,2,. 



, 7] and n 



Af-1 

E 

1=0 



{gjiX,)~pfiXe))fi£)xAn) 
< B{K) IfWXii^ 



<B{K) 

< Wl|f|i2, 




(32) 



(33) 



where ( |32| l follows from Holder's inequality, and ( [33) follows 
from the Parseval relation (just after equation (49) in lIZTll ) and 
footnote 3 in 1271. Then 



|(*-*)f||2 



■q N 



{j-l)N+n 



V N 



j=i t=\ 



\ 



BWIIflUV^iV 



(34) 



Rearranging ( [34| l yields ( |T5| I. ■ 
Proof of Proposition^^ The objective function in ([T6| is 
convex in f . Differentiating it with respect to f , any solution 
L to 



,(f* 







(35) 



is a solution to ([T6|jj Taking the graph Fourier transform of 
( |35| ) yields 



V£ e {0, 1, 



0, 



(36) 



!}• 



From the real, symmetric nature of £ and the definition of the 
Laplacian eigenvectors {Lxi — A^x^), we have: 

Cn,{() = x^C^^U - (r'x,)* f* - = (37) 

*In the case r = 1, the optimality equation j35j corresponds to the 
optimality equation in [17 Section III- A] with p = 2 in that paper. 
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Substituting ( (37] i into (|36]l and rearranging, we have 



2\] 



V^e{0,i,...,iV-l}. 



(38) 



Finally, taking the inverse graph Fourier transform of ( |38| l, we 
have 



N-1 



U{n)=Y^U{t)Xi{n)=Y. 



£=0 



i=0 



r + 2A^_ 
Vn e {1,2,...,N}. 



Proof of Proposition^ The solutions a* to ( [T7] i and a* 
to ([T9]l are not unique; however, their images H*a* and H*a* 
are unique. To see this, for example for H*a*, we can write 
([TtJ equivalently as 



argmin -|ly-b||2 + |la||i,^ 

a,b ^ 



S.t. 



b = H*a. 



Then by the strict convexity of ||-||2, the convexity of 
and Lemma [T] below, H*a» is unique. 

Lemma 1: Let /i : M" ^ M be strictly convex, /2 : M™ 
M be convex, and A e M"^™. Then the solution (x*,y*) to 



argmin /i(x) + /2(y) 

x6K", ygR"' 



(39) 



s.t. 



X = Ay 



is unique with respect to x* (but not necessarily y*). 

Proof of Lemma^ Let (xi,yi) and (x2,y2) be in the 
set ( |39l ), and assume xi ^ X2. Then by linearity, (xa.ya) 
|(xi,yi) + ^(x2,y2) satisfies X3 = Ayg, and by the strict 
convexity of /i(-) and convexity of f2{-), 

/l(x3) + /2(y3) < ^/l(xi) + ^/l(x2) + ^/2(yi) + ^/2(y2) 

= , mill /i(x) + /2(y), 

{xeR", yeK": x=Ay} 

which is a contradiction. Thus, xi = X2. ■ 
It follows from the first-order necessary and sufficient 
optimality equations of the lasso problem (see, e.g., ff39l 
Proposition 5.3(iv)]) that for all a e M^^-^+i), we have 

(y-H*a,,H*a-H*a,) + ||a,||i,^< llalli,;,, (40) 



and similarly 



(y-H*a*,H*a-H*a 



|a*|li.^<lla|li„ 



(41) 



Taking a = a* in (|40| and a = a* in ( |4T] ), summing (|40| and 
( |4T] l, and rearranging, we have 



(y £1 a,t , ii a,t u a,^ 



(y-H*a*,H*a* - H*a* 



= ||y - a a,||2 + (y - a a,, a a, - y) 

+ ||y-H*a,||2 + (y-H*a,,H*a, -y) <0. (42) 



Then 



^*ll2 

|2 
2 



lly-H*&* 



l2-2(y-H*a„y-H*a, 



= l|y-3*a, 

f (y - H*a„ (H* - H*)a,) + (y - H*a„ (H* - H*)a,) 
< ||y-H*a,||2 |||H*-H*|||2 ||a,||2 

+ ||y-H*a,||2 |||H*-H*|||2 ||a,||2 (43) 
<||y||2 |||H-H|||2(||a,||2 + ||a,||2), (44) 



where ( |43| l follows from the Cauchy-Schwarz inequality, and 
(|44]) follows from the facts that |||A*|||2 = |||v4|||2 |35» P- 309], 
and ||y-H*a,||2 < ||y||2 and ||y - H*a,||2 < ||y||2 by 
the optimality of a* and a*, and the feasibility of a = 0. 
Finally, by the uniqueness of H*a*, ||a*||i^^ is the same for 
all solutions a*, and 



mm /i 



}« 



a*||2 



< lla^lli,;. < -||y 



II2 



||a*||i,;, < -||y||^, 



(45) 



where the last inequality again follows from feasibility of a = 
0. The bound in ( |45| ) also holds for {min^ /i^} ||a* II2, and 
substituting these into ( |44] l yields the desired result. ■ 
Proof of Proposition^ As in Proposition]?] the objective 
function in ( ]22] i is convex in f . Differentiating it with respect 
to f, any solution to 



is a solution to (J22|. Note that 
and 



(46) 



(47) 



(48) 



Therefore, taking the graph Fourier transform of ( |46l ) and 
substituting in (]37]i, (]47]i, and (]48]l yields 



Tg%{\i) + 2X 



-m, v^e{0,i,...,iv-i}. 
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